{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 下载llama.cpp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "latex"
    }
   },
   "source": [
    "source /etc/network_turbo\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "git clone https://github.com/ggerganov/llama.cpp.git\n",
    "\n",
    "cd llama.cpp\n",
    "\n",
    "pip install -r requirements.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 如果不量化，保留模型的效果\n",
    "python convert_hf_to_gguf.py /root/autodl-tmp/Qwen/Qwen3-8B  --outtype f16 --verbose --outfile /root/autodl-tmp/Qwen/qwen3_8b_f16.gguf\n",
    "### 如果需要量化（加速并有损效果），直接执行下面脚本就可以\n",
    "python convert_hf_to_gguf.py /root/autodl-tmp/Qwen/Qwen3-8B  --outtype q8_0 --verbose --outfile /root/autodl-tmp/Qwen/qwen3_8b_q8_0.gguf\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里--outtype是输出类型，代表含义：\n",
    "\n",
    "q2_k：特定张量（Tensor）采用较高的精度设置，而其他的则保持基础级别。\\\n",
    "q3_k_l、q3_k_m、q3_k_s：这些变体在不同张量上使用不同级别的精度，从而达到性能和效率的平衡。\\\n",
    "q4_0：这是最初的量化方案，使用 4 位精度。\\\n",
    "q4_1 和 q4_k_m、q4_k_s：这些提供了不同程度的准确性和推理速度，适合需要平衡资源使用的场景。\\\n",
    "q5_0、q5_1、q5_k_m、q5_k_s：这些版本在保证更高准确度的同时，会使用更多的资源并且推理速度较慢。\\\n",
    "q6_k 和 q8_0：这些提供了最高的精度，但是因为高资源消耗和慢速度，可能不适合所有用户。\\\n",
    "fp16 和 f32: 不量化，保留原始精度。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 制作ModelFile文件"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "latex"
    }
   },
   "outputs": [],
   "source": [
    "FROM /root/autodl-tmp/Qwen/qwen3_8b_q8_0.gguf\n",
    "\n",
    "# set the temperature to 0.7 [higher is more creative, lower is more coherent]\n",
    "PARAMETER temperature 0.7\n",
    "PARAMETER top_p 0.8\n",
    "PARAMETER repeat_penalty 1.05\n",
    "TEMPLATE \"\"\"{{ if .System }}<|im_start|>system\n",
    "{{ .System }}<|im_end|>\n",
    "{{ end }}{{ if .Prompt }}<|im_start|>user\n",
    "{{ .Prompt }}<|im_end|>\n",
    "{{ end }}<|im_start|>assistant\n",
    "{{ .Response }}<|im_end|>\"\"\"\n",
    "# set the system message\n",
    "SYSTEM \"\"\"\n",
    "You are a helpful assistant.\n",
    "\"\"\"\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "latex"
    }
   },
   "source": [
    "ollama serve\n",
    "\n",
    "ollama create midori --file ./ModelFile\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "vscode": {
     "languageId": "latex"
    }
   },
   "source": [
    "ollama run Qwen3-8B-Q8"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "latex"
    }
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
